Upgrade masters last when upgrading ES clusters #8871
Conversation
buildkite test this -f p=kind,t=TestNonMasterFirstUpgradeComplexTopology -m s=9.1.2
buildkite test this -f p=kind,t=TestHandleUpscaleAndSpecChanges_VersionUpgradeDataFirstFlow -m s=9.1.2
Pull Request Overview
This PR implements a non-master-first upgrade strategy for Elasticsearch clusters. The key change ensures that during version upgrades, non-master nodes (data, ingest, coordinating nodes) are upgraded before master nodes, which helps maintain cluster stability during upgrades.
- Adds logic to separate master and non-master StatefulSets during version upgrades (see the sketch after this list)
- Implements upgrade order validation to ensure non-master nodes complete their upgrades first
- Adds comprehensive unit and e2e tests to verify the upgrade flow
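A minimal sketch of the separation described above, assuming hypothetical variable names; `nodespec.ResourcesList` and `label.IsMasterNodeSet` come from the repository, but their exact use here is an assumption rather than the PR's actual code:

```go
// Sketch only: split the expected resources so master StatefulSets can be
// handled last during a version upgrade. Variable names are illustrative.
var masterResources, nonMasterResources nodespec.ResourcesList
for _, res := range expectedResources {
	if label.IsMasterNodeSet(res.StatefulSet) {
		masterResources = append(masterResources, res)
	} else {
		nonMasterResources = append(nonMasterResources, res)
	}
}
```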
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/controller/elasticsearch/driver/upgrade.go | Adds check to identify new clusters vs upgrades by checking if status version is empty (sketched below the table) |
| pkg/controller/elasticsearch/driver/upscale.go | Implements non-master-first upgrade logic with resource separation and upgrade status checking |
| pkg/controller/elasticsearch/driver/upscale_test.go | Adds comprehensive unit test for version upgrade flow and minor formatting fixes |
| test/e2e/es/non_master_first_upgrade_test.go | Adds e2e test that validates non-master-first upgrade behavior with a watcher |
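For context, the new-cluster check mentioned for upgrade.go could look roughly like this; field names follow the Elasticsearch CRD, while the receiver and variable names are assumptions:

```go
// Sketch only: an empty status version means the cluster is being created,
// so there is nothing to upgrade and no need for the non-master-first flow.
isNewCluster := d.ES.Status.Version == ""
isVersionUpgrade := !isNewCluster && d.ES.Status.Version != d.ES.Spec.Version
```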
I'll take another look today, sorry for the lag.
```go
pendingNonMasterSTS = append(pendingNonMasterSTS, actualStatefulSet)
continue
}
```
We should not rely on the status until the sts controller has observed the new generation.
Suggested change:
```go
if actualStatefulSet.Status.ObservedGeneration < actualStatefulSet.Generation {
	// The StatefulSet controller has not yet observed the latest generation.
	pendingNonMasterSTS = append(pendingNonMasterSTS, actualStatefulSet)
	continue
}
```
or:
- use actualStatefulSets.PendingReconciliation() before that loop
- create a common function to reuse the logic in cloud-on-k8s/pkg/controller/elasticsearch/driver/expectations.go (lines 23 to 46 in cb0d001):
```go
func (d *defaultDriver) expectationsSatisfied(ctx context.Context) (bool, string, error) {
	log := ulog.FromContext(ctx)
	// make sure the cache is up-to-date
	expectationsOK, reason, err := d.Expectations.Satisfied()
	if err != nil {
		return false, "", err
	}
	if !expectationsOK {
		log.V(1).Info("Cache expectations are not satisfied yet, re-queueing", "namespace", d.ES.Namespace, "es_name", d.ES.Name, "reason", reason)
		return false, reason, nil
	}
	actualStatefulSets, err := sset.RetrieveActualStatefulSets(d.Client, k8s.ExtractNamespacedName(&d.ES))
	if err != nil {
		return false, "", err
	}
	// make sure StatefulSet statuses have been reconciled by the StatefulSet controller
	pendingStatefulSetReconciliation := actualStatefulSets.PendingReconciliation()
	if len(pendingStatefulSetReconciliation) > 0 {
		log.V(1).Info("StatefulSets observedGeneration is not reconciled yet, re-queueing", "namespace", d.ES.Namespace, "es_name", d.ES.Name)
		return false, fmt.Sprintf("observedGeneration is not reconciled yet for StatefulSets %s", strings.Join(pendingStatefulSetReconciliation.Names().AsSlice(), ",")), nil
	}
	// make sure pods have been reconciled by the StatefulSet controller
	return actualStatefulSets.PodReconciliationDone(ctx, d.Client)
}
```
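A minimal sketch of the first option, reusing the same guard as the quoted function before the non-master loop; the return values mirror the quoted code, and where exactly this check would live is an assumption:

```go
// Sketch only: bail out before iterating if any StatefulSet status is stale.
pending := actualStatefulSets.PendingReconciliation()
if len(pending) > 0 {
	return false, fmt.Sprintf("observedGeneration is not reconciled yet for StatefulSets %s",
		strings.Join(pending.Names().AsSlice(), ",")), nil
}
```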
I will take another look at it today. Edit: was not able to make it today, will do tomorrow first thing.
```go
targetReplicas := sset.GetReplicas(res.StatefulSet)
// …
if actualReplicas < targetReplicas {
	actualSset.Spec.Replicas = ptr.To(targetReplicas)
```
I think we should use UpdateReplicas(...) instead of updating Replicas directly, so that common.k8s.elastic.co/template-hash is also updated.
Suggested change:
```diff
-	actualSset.Spec.Replicas = ptr.To(targetReplicas)
+	nodespec.UpdateReplicas(&actualSset, ptr.To[int32](targetReplicas))
```
```go
if actualReplicas < targetReplicas {
	actualSset.Spec.Replicas = ptr.To(targetReplicas)
	if err := ctx.k8sClient.Update(ctx.parentCtx, &actualSset); err != nil {
```
Should we not use es_sset.ReconcileStatefulSet() instead of calling k8sClient.Update(..) directly? It already includes the call to expectations.ExpectGeneration(reconciled).
And maybe a follow-up question is should we update UpscaleResults with the result of es_sset.ReconcileStatefulSet() to work with a consistent view in the rest of the driver logic.
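A rough sketch of what that could look like; the ReconcileStatefulSet signature and the ctx field names are assumptions here and should be checked against the actual helper:

```go
// Sketch only (assumed signature): reconcile through the shared helper so that
// expectations.ExpectGeneration(reconciled) is recorded automatically.
reconciled, err := es_sset.ReconcileStatefulSet(ctx.parentCtx, ctx.k8sClient, ctx.es, actualSset, ctx.expectations)
if err != nil {
	return err
}
// The reconciled StatefulSet could also be fed back into UpscaleResults
// (see the follow-up question above) so the rest of the driver works on a consistent view.
_ = reconciled
```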
```go
// The only adjustment we want to make to master statefulSets before ensuring that all non-master
// statefulSets have been reconciled is to potentially scale up the replicas
// which should happen 1 at a time as we adjust the replicas early.
if err = maybeUpscaleMasterResources(ctx, masterResources); err != nil {
```
Suggested change:
```diff
-	if err = maybeUpscaleMasterResources(ctx, masterResources); err != nil {
+	if err := maybeUpscaleMasterResources(ctx, masterResources); err != nil {
```
```go
// Read the current StatefulSet from k8s to get the latest state
var actualSset appsv1.StatefulSet
if err := ctx.k8sClient.Get(ctx.parentCtx, k8s.ExtractNamespacedName(&res.StatefulSet), &actualSset); err != nil {
	if apierrors.IsNotFound(err) {
```
I was trying to understand in which cases we can get a 404. IIUC, one of them is when the user is attempting to scale up the masters with a new nodeSet? Let's maybe add a godoc to explain that we are only scaling up existing master StatefulSets; new ones are ignored.
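Something along these lines could serve as the requested godoc; the wording is only a suggestion, with the function name taken from the surrounding diff:

```go
// maybeUpscaleMasterResources scales up the replicas of master StatefulSets
// that already exist in the cluster. Master StatefulSets that do not exist yet
// (for example a new master nodeSet added alongside the version change) are
// intentionally skipped here and are created by the regular reconciliation
// once the non-master StatefulSets have been upgraded.
```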
```go
// The only adjustment we want to make to master statefulSets before ensuring that all non-master
// statefulSets have been reconciled is to potentially scale up the replicas
// which should happen 1 at a time as we adjust the replicas early.
if err = maybeUpscaleMasterResources(ctx, masterResources); err != nil {
```
I just realized that calling this when len(nonMasterResources) == 0 (or more generally, when all non-master nodesets have already been upgraded?) can be slightly suboptimal.
Assuming that the initial state is:
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 9.1.0
  nodeSets:
  - name: default
    config:
      node.roles: ["master", "data", "ingest", "ml"]
      node.store.allow_mmap: false
    count: 3
```

If we update and upgrade to:
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 9.1.2
  nodeSets:
  - name: default
    config:
      node.roles: ["master", "data", "ingest", "ml"]
      node.store.allow_mmap: false
    count: 4
```

Then we are going to scale up the 9.1.0 StatefulSet, leading to the creation of elasticsearch-sample-es-default-3, but immediately in the next reconciliation we are going to delete elasticsearch-sample-es-default-3 to upgrade it to 9.1.2.
My previous comment made me wonder if !isVersionUpgrade is actually the only reason we might want to reconcile everything at once.
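Putting the last two comments together, the fallback condition might conceptually look like this; purely illustrative, not the PR's code:

```go
// Sketch only: reconcile everything at once unless this is a version upgrade
// with non-master StatefulSets still left to handle first.
if !isVersionUpgrade || len(nonMasterResources) == 0 {
	// previous behaviour: reconcile all StatefulSets in a single pass
} else {
	// non-master-first flow: only upscale masters now, upgrade them after the
	// non-master StatefulSets have been reconciled
}
```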
buildkite test this -f p=kind,t=TestNonMasterFirstUpgradeComplexTopology -m s=8.15.2
```go
func(k *test.K8sClient, t *testing.T) {
	statefulSets, err := essset.RetrieveActualStatefulSets(k.Client, k8s.ExtractNamespacedName(&es))
	if err != nil {
		t.Logf("failed to get StatefulSets: %s", err.Error())
```
Will this test fail if we consistently get an error here? (my feeling is that it's not going to be the case because violations is always empty in that case, but maybe I'm missing something)
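One way to make such errors count against the test, sketched under the assumption that the watcher has access to the standard *testing.T; the t.Errorf call is the substitution here, everything else mirrors the snippet above:

```go
// Sketch only: record the retrieval error as a test failure instead of a log line,
// so the watcher cannot pass when the API call consistently fails.
statefulSets, err := essset.RetrieveActualStatefulSets(k.Client, k8s.ExtractNamespacedName(&es))
if err != nil {
	t.Errorf("failed to get StatefulSets: %s", err.Error())
	return
}
_ = statefulSets // subsequent checks for upgrade-order violations would go here
```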
barkbay left a comment:
Almost LGTM. I think we need to adjust the way we scale the master nodes; also, the e2e test seems broken (we create the data integrity index with no replicas, which should fail during a rolling upgrade) and may not be accurate in case of errors.
```go
mutated := initial.WithNoESTopology().
	WithVersion(dstVersion).
	WithESMasterNodes(3, elasticsearch.DefaultResources).
	WithESDataNodes(2, elasticsearch.DefaultResources).
	WithESCoordinatingNodes(1, elasticsearch.DefaultResources)
```
Without WithMutatedFrom the data integrity index has no replicas, which means that the cluster is going to be red as soon as one Pod is recreated, and the test is going to fail.
Suggested change:
```diff
-	mutated := initial.WithNoESTopology().
-		WithVersion(dstVersion).
-		WithESMasterNodes(3, elasticsearch.DefaultResources).
-		WithESDataNodes(2, elasticsearch.DefaultResources).
-		WithESCoordinatingNodes(1, elasticsearch.DefaultResources)
+	mutated := initial.WithVersion(dstVersion).WithMutatedFrom(&initial)
```
Edit: confirmed by the e2e test I started in my previous buildkite comment:

```json
{
  "Time": "2025-12-03T09:12:11.122554149Z",
  "Action": "output",
  "Package": "github.com/elastic/cloud-on-k8s/v3/test/e2e/es",
  "Test": "TestNonMasterFirstUpgradeComplexTopology/Elasticsearch_cluster_health_should_not_have_been_red_during_mutation_process",
  "Output": "{\"log.level\":\"error\",\"@timestamp\":\"2025-12-03T09:12:11.117Z\",\"message\":\"continuing with additional tests\",\"service.version\":\"0.0.0-SNAPSHOT+00000000\",\"service.type\":\"eck\",\"ecs.version\":\"1.4.0\",\"error\":\"test Elasticsearch cluster health should not have been red during mutation process failed\",\"error.stack_trace\":\"github.com/elastic/cloud-on-k8s/v3/test/e2e/test.StepList.RunSequential\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/step.go:52\ngithub.com/elastic/cloud-on-k8s/v3/test/e2e/test.RunMutationsWhileWatching\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/run_mutation.go:77\ngithub.com/elastic/cloud-on-k8s/v3/test/e2e/es.runNonMasterFirstUpgradeTest\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/es/non_master_first_upgrade_test.go:76\ngithub.com/elastic/cloud-on-k8s/v3/test/e2e/es.TestNonMasterFirstUpgradeComplexTopology\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/es/non_master_first_upgrade_test.go:102\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1792\"}\n"
}
```
Fixes #8429
What is changing?
This ensures that the master StatefulSets are always upgraded last when doing a version upgrade of Elasticsearch.
Validation